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ABSTRACT 

In arXiv:091 1.2150 Rutger van Haasteren seeks to criticize the nested sampling algorithm 
for Bayesian data analysis in general and its MultiNest implementation in particular. He 
introduces a new method for evidence evaluation based on the idea of Voronoi tessellation 
and requiring samples from the posterior distribution obtained through MCMC based meth- 
ods. He compares its accuracy and efficiency with MultiNest, concluding that it outper- 
forms MultiNest in several cases. This comparison is completely unfair since the proposed 
method can not perform the complete Bayesian data analysis including posterior exploration 
and evidence evaluation on its own while MultiNest allows one to perform Bayesian data 
analysis end to end. Furthermore, their criticism of nested sampling (and in turn Multi- 
Nest) is based on a few conceptual misunderstandings of the algorithm. Here we seek to set 
the record straight. 
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1 INTRODUCTION 



In a recent paper ( Ivan Haasterenll2009l) . Rutger van Haasteren has 
criticized the MultiNest algorithm for Bayesian analysis and 
suggested another method to calculate the Bayesian evidence with 
the claim that it significantly outperforms MultiNest both in 
terms of accuracy and computational accuracy. Our aim in this short 
note is to highlight the conceptual misunderstandings and the false 
premise on which the comparison is based. 

The outline of the paper is as follows. In Sec. [2] we give an 
introduction to Bayesian inference and describe the nested sam- 
pling algorithm and its implementation in MultiNest package in 
Sec. [3] In Sec. [4] we give an acco unt of the conceptual misunder- 
standings in Ivan Haasteren 2009 which are the basis of the attack 
mounted by the author on MultiNest. Finally, our conclusions 
are presented in Sec. [5] 



2 BAYESIAN INFERENCE 

Bayesian inference methods provide a consistent approach both to 
the estimation of a set of parameters in a model (or hypothe- 
sis) H for the data D and to the ev aluation of th e relative merits 
of different models for the data (see iTrottal J2008h for a review of 
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Bayesian methods in cosmology and astrophysics). Bayes' theorem 
states that 



Pr(0|D,#) = 



Pr(D| &,H)Pr(&\H) 
Pr(D|if) ' 



(1) 



where Pr(0jD, H) = P(0) is the posterior probability distribu- 
tion of the parameters, Pr(Dj©,f/) = C(&) is the likelihood, 
Pr(®\H) = 7r(0) is the prior distribution, and Pr(D|H) = Z is 
the Bayesian evidence. 

Bayesian evidence is the factor required to normalise the pos- 
terior over 0: 



Z = / £(0)7r(0)d D 0, 



(2) 



where D is the dimensionality of the parameter space. Since the 
Bayesian evidence is independent of the parameter values 0, it is 
usually ignored in parameter estimation problems and the posterior 
inferences are obtained by exploring the un-normalized posterior 
using standard MCMC sampling methods. 

Bayesian parameter estimation has been used quite exten- 
sively in a variety of astronomical applications, including gravita- 
tional wave astronomy, although standard MCMC methods, such as 
the basic Metropolis-Ha stings algorith m or the Hamiltonian sam- 
pling technique (see e.g. iMackav 2003), can experience problems 
in sampling efficiently from a multi-modal posterior distribution or 
one with large (curving) degeneracies between parameters. More- 
over, MCMC methods often require careful tuning of the proposal 
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Figure 1. Cartoon illustrating (a) the posterior of a two dimensional prob- 
lem; and (b) the transformed C(X) function where the prior volumes X; 
are associated with each likelihood d . 



distribution to sample efficiently, and testing for convergence can 
be problematic. 

In order to select between two models Ho and Hi one needs to 
compare their respective posterior probabilities given the observed 
data set D, as follows: 



Pr(Hi|D) _ Pr(D|#i) Pr(#i) _ Zi Pr(ifi) 



Pr(tf |D) Pr(D|tf )Pr(#o) Z Pt(H ) 



(3) 



where Pt(Hi)/ Pt(Ho) is the prior probability ratio for the two 
models, which can often be set to uni ty but occasionally requi res 
further consideration (see, for example. lFeroz et al .2009el l2008l for 
the cases where the prior probability ratio should not be set to unity) 
and the ratio of the evidences ^ is called the Bayes factor between 
the two models. It can be seen from Eq. l[3} that the Bayesian evi- 
dence plays a central role in Bayesian model selection. As the aver- 
age of likelihood over the prior, the evidence automatically imple- 
ments Occam's razor: a simpler theory which agrees well enough 
with the empirical evidence is preferred. A more complicated the- 
ory will only have a higher evidence if it is si gnificantly bett e r at ex- 
plaining the data than a simpler theory (e.g. jLiddlel d2004l) : pTrott j 
J2OQM). 

The evaluation of the Bayesian evidence involves the multi- 
dimensional integral (Eq. [2} and thus presents a challenging nu- 
merical task. Standard technique s like thermodynamic integration 
16 Ruanaidh & Fitzgerald! { 19961) are usually extremely computa- 
tionally expensive, which makes evidence evaluation typically at 
least an order of magnitude more costly than parameter estima- 
tion. Recently, a estimation of the evidence using population Monte 
Carlo techniques has been successfully implemented in the cos- 
mological context iKilbingeretal] {2009). Some fast approximate 
methods have been used for evidence evaluation, such as treating 
the poste rior as a multivariate Gaussian centred at its peak (see, for 
example, Hobson et al. 2002), but this approximation is clearly a 
poor one for highly non-Gaussian and multi-modal posteriors. A 
computation ally cheap and a ccurate method is the S avage-Dickey 
density ratio Trotta ( 2007a. b); Var danvan et al.l J2009I) . which how- 
ever can only be used to compute the Bayes factor between nested 
models. Various alterna t ive information crite ria for model selection 
are discussed in iLiddld d2004l) : lLiddld J2OO7I) . but the evidence re- 
mains the preferred method. 



3 NESTED SAMPLING AND THE MULTINEST 
ALGORITHM 

In this section we briefly review the Nested sampling algorithm 
a nd the MULTIN E ST implementation. Furt h er details can be fou nd 
in lSkillingl l2004l) : lFeroz & Hobsonl l f2008l) : lFeroz et al.l < l2009ch . 



3.1 Nested Sampling 

Nested sampling ISkillind (2004) is a Monte Carlo method target- 
ted at the efficient calculation of the evidence, but also produces 
posterior inferences as a by-product. It calculates the evidence by 
transforming the multi-dimensional evidence integral into a one- 
dimensional integral that is easy to evaluate numerically. This is ac- 
complished by defining the prior volume X as dX — ir(®)d D ®, 
so that 



X(X) = [ n(G)d D &, 

J£(&)>\ 



(4) 



where the integral extends over the region(s) of parameter space 
contained within the iso-likelihood contour C(&) = A. The evi- 
dence integral, Eq. l|2}, can then be written as 



Z = 



C(X)dX, 



(5) 



where C(X), the inverse of Eq. ((4), is a monotonically decreas- 
ing function of X. Thus, if one can evaluate the likelihoods d = 
C(Xi), where X; is a sequence of decreasing values, 



< X u < ■ ■ ■ < X 2 < Xi < X = 1, 



(6) 



as shown schematically in Fig.Q] the evidence can be approximated 
numerically using standard quadrature methods as a weighted sum 



Z 
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(7) 



where the weights Wi for the simple trapezium rule are given by 
Wi = |(Jfj_i — Xj+i). An example of a posterior in two dimen- 
sions and its associated function C(X) is shown in Fig.[T] 

The summation in Eq. 10 is performed as follows. The iter- 
ation counter is first set to i = and N 'active' (or 'live') sam- 
ples are drawn from the full prior n(&), so the initial prior volume 
is Xq = 1. The samples are then sorted in order of their likeli- 
hood and the smallest (with likelihood Co) is removed from the ac- 
tive set (hence becoming 'inactive') and replaced by a point drawn 
from the prior subject to the constraint that the point has a likeli- 
hood C > Co- The corresponding prior volume contained within 
the iso-likelihood contour associated with the new live point will 
be a random variable given by X\ — t±Xo, where t± follows the 
distribution Pr(t) = TV^ -1 (i.e., the probability distribution for 
the largest of N samples drawn uniformly from the interval [0, 1]). 
At each subsequent iteration i, the removal of the lowest likelihood 
point d in the active set, the drawing of a replacement with C > C\ 
and the reduction of the corresponding prior volume Xi — tiXi-i 
are repeated, until the entire prior volume has been traversed. The 
algorithm thus travels through nested shells of likelihood as the 
prior volume is reduced. The mean and standard deviation of log t, 
which dominates the geometrical exploration, are: 



E [log t] = - 1/N, a [log t) = 1 /N. 



(8) 



Since each value of log t is independent, after i iterations the prior 
volume will shrink down such that log X\ ~ — (i ± ^/i) /TV. Thus, 
one takes X\ = exp(— i/N). 

The nested sampling algorithm is terminated when the evi- 
dence has been computed to a pre-specified precision. The evidence 
that could be contributed by the remaining live points is estimated 
as AZi — C m axXi, where £ max is the maximum-likelihood value 
of the remaining live ppoints, and Xi is the remaining prior volume. 
The algorithm terminates when AZi is less than a user-defined 
value (we use 0.5 in log-evidence). 
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Once the evidence Z is found, posterior inferences can be eas- 
ily generated using the final live points and the full sequence of 
discarded points from the nested sampling process, i.e., the points 
with the lowest likelihood value at each iteration i of the algorithm. 
Each such point is simply assigned the probability weight 



Z 



(9) 



These samples can then be used to calculate inferences of posterior 
parameters such as means, standard deviations, covariances and so 
on, or to construct marginalised posterior distributions. 

3.2 The MultiNest Algorithm 

The most challenging task in implementing nested sampling is to 
draw samples from the prior within the hard co nstraint C > d 
at eac h iteration i. The M ultiNest algorithm iFeroz & Hobsonl 
J2008h ; Fero z et al. (2009c) tackles this problem through an ellip- 
soidal rejection sampling scheme. The live point set is enclosed 
within a set of (possibly overlapping) ellipsoids and a new point is 
then drawn uniformly from the region enclosed by these ellipsoids. 
The ellipsoidal decomposition of the live point set is chosen to min- 
imize the sum of volumes of the ellipsoids. The ellipsoidal decom- 
position is well suited to dealing with posteriors that have curving 
degeneracies, and allows mode identification in multi-modal poste- 
riors. If there are subsets of the ellipsoid set that do not overlap with 
the remaining ellipsoids, these are identified as a distinct mode and 
subsequently evolved independently. 

The MultiNest algorithm has proven very useful for 
tackling inference problems in cosmology and pa r ticle physics 
(see e g. ISekiguchi et al] 120091: bridges et all 120091 : IVegetti etai] 



2009e, 20081 2009d: IFeroz et al. 




2008a; 


Trotta et al. 


2008; 


Lopez-Fogliani et al. 2009; Raklev & White 


2009; 


Trotta et al. 



magnitude improvement in efficiency over conventional tech- 
niques. More recently, it has been shown to perform well as a search 
tool for gravitational wave data analysis jFeroz et alj|2009bll3) . 



4 DETAILED CRITIQUE OF VAN HAASTEREN (2009) 

In Ivan Haasteren (2009), the author proposes a method based on 
Vornoi tessellation to calculate the Bayesian evidence (Eq. O as 
follows: 



Z 



(10) 



where the index i iterates over the samples (may be obtained 
through an MCMC algorithm) in the parameter space and Oi is 
the area of the Vornoi cell occupied by the i th sample. Although 
this approximation converges to the true evidence value, it can not 
be used in practice because of the high computational cost involved 
in the calculation of Vornoi tesselation. In order to overcome this 
problem, the author suggests to concentrate on a small subset of 
samples Ft , around the peak of the distribution, occupying volume 
Vt and exploits the following relationship: 



(id 



©iSJt 



where a is a proportionality constant to be determined. In order to 
calculate a, the author then suggests to make a Gaussian approx- 
imation around the peak so that the volume Vt can be calculated 



V t = 



r(i + f] 



(12) 



where D is the dimensionality of the problem, C the covariance 
matrix and r is the Mahalanobis distance from the peak of the point 
with the lowest likelihood value inside the region Ft . Once a has 
been calculated using Eqs.dl It and &121 , The evidence can be cal- 
culated as: 



Z = Na, 



(13) 



where N is the total number samples inside the region Ft . The au- 
thor then applies this method to a few toy problems and compares it 
with MultiNest, attempting to show that it outperforms MULTI- 
NEST in terms of accuracy as well as computational efficiency. 

We bel ieve that the comparis on of MULTINEST to the method 
proposed in van Haasteren ( 2009) is completely unfair, as it makes 
a very strong assumption about the availability of suitable poste- 
rior samples. Moreover, the criticism of nested sampling algorithm 
is due to author's misunderstanding about the method. We discuss 
these in detail now. 



4.1 Nested Sampling solves the full inference problem 

The method proposed in van H aasteren! {2009) and briefly sum- 
marized in the previous section requires the region Ft to be suf- 
ficiently and adequately populated by MCMC samples but does 
not propose an MCMC algorithm to satisfy this condition. There- 
fore, the proposed method does not provide a complete solution 
for Bayesian inference problem, but it actually makes the strong 
assumption that such a solution has already been implemented us- 
ing another technique. For many problems of interest (e.g. problem 
exhibiting strong, curving degeneracies and/or multi-modality) ad- 
equately sampling the parameter space is a difficult problem which 
often proves to be a stumbling block e ven for the parameter in- 
ference step. The algorithm proposed by Ivan Haasterenl fc009h as- 
sumes that this part of the problem has already been solved, which 
strongly limits the usefulness of the suggested method if no effi- 
cient way of gathering posterior samples is put forward. 

The MULTINEST algorithm, on the other hand, provides a 
means to carry out both parameter exploration and the evidence 
evaluation in an efficient and robust way. This has been widely 
demonstrated by applying it very successfully to various real in- 
ference problems in astroph ysics, cosmology and particle physics 
phenomenology (s ee e. 



Sekiguchi et al. 2009; Bridges et al 



|2009l; IVegetti et al.1 12009];' ISollom et alj 120091; lAbdusSalam et al 
| 2009bllat IFeroz et all p2009e1. 120081. |2009d: IFeroz et alj 12008 
Feroz et al]2009bllal:lTrotta et al]2008ML6pez-Fogriani et alj200' 
kaklev & Whitel2009h . 

The eggbox problem (discussed in Sec. 5.2 of Ivan Haasterenl 

l2009h is a particular example of what we believe is an unfair com- 
parison. The author assumes that a suitable trick can be found en- 
abling the MCMC algorithm to explore all the modes adequately 
so that his proposed method can be applied to this highly multi- 
modal problem, while MultiNest not only finds all the modes 
without making any assumptions but also calculates the evidence 
accurately. 

A fair comparison in our opinion would require the specifica- 
tion of another algorithm capable of providing not only the means 
to calculate the evidence but also to explore the parameter space 
without making too many assumptions. In our view, the compari- 
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son between the method of lvan Haasterenl d2009h and MultiNest 
is fundamentally unfair, as the latter has a much wider applicability 
and solves the full inference problem from scratch. 

One of the most attractive features of MultiNest is its gen- 
eral applicability for moderately dimensional but highly complex 
problems. It is completely application-independent in the sense 
that the specific problem being tackled enters only through the like- 
lihood computation, and does not change how the live point set 
is updated. It makes no assumptions about the number and nature 
of modes. Even if a perfect MCMC samp ler were available, the 
method proposed in Ivan Haasterenl (2009) although can be quite 
useful for uni-modal Gaussian problems, would nonetheless fail 
for highly non-Gaussia n problems. The G aussian-shell problem 
discussed in Sec. 6.2 of iFeroz et al.l |2009c) would perhaps be the 
most obvious example. Comparing MultiNest with the proposed 
method on problems ideally suited to the proposed method is an- 
other reason why we think the comparison is completely unfair. 

4.2 Misconceptions about Nested Sampling 

In the introduction, the author argues that MultiNest (and conse- 
quently nested sampling) algorithm by design, samples from the 
prior distribution and not the posterior distribution, and conse- 
quently suffers more with the 'curse of dimensionality' than the 
traditional MCMC based algorithms. This statement ignores the 
fact that nested sampling does not sample from the prior distribu- 
tion blindly, it samples from the prior within the hard constraint 
C > d at each iteration i. As discussed in Sec. 13.11 this hard- 
edge sampling scheme results in exponential decrease in the prior 
volume X; occupied by the live points at the i th iteration with the 
expected value of Xi given as, 

< Xi >= exp(-i/N), (14) 

N being the total number of live points. This allows the algorithm 
to reach highly localized regions of the parameter space with large 
likelihood values in a reasonable number of iterations. We can not 
see why such a sampling scheme would suffer from the 'curse of 
dimensionality' more than the traditional MCMC schemes. 

In our opinion, the information content H, given as follows, 

H - 1 iog (§) dx > (i5) 

where P denotes the posterior, is more important than the di- 
mensionality of the problem. Most of the contribution to the ev- 
idence value usually comes from the iterations around the maxi- 
mum likelihood point, which occurs in the region with prior vol- 
ume X « e~ H and therefore because of the exponential shrinkage 
of the prior volume, one could argue that nested sampling has an 
inherent advantage over MCMC schemes. It should be noted that 
nested sampling framework itself leaves it to the user to design a 
scheme to sample from the iso-likelihood contour and it is cer- 
tainly possible to come up with highly inefficient schemes, to sam- 
ple a new point with C > d, suffering severely with the 'curse of 
dimensionality'. 

In Sec. 4.2, the author claims that nested sampling generates 
samples from the whole of the parameter space, rather than from 
the posterior distribution, and therefore it can never reach the ef- 
ficiency achieved by the traditional MCMC based methods. This 
statement ignores two key points. First, as discussed earlier in this 
section, nested sampling does not sample from the whole of the 
prior distribution, but instead it samples from the hard-edge region 
inside the prior whose volume is reduced exponentially with each 



iteration. Secondly, the samples generated by a nested sampling al- 
gorithm are not all equally weighted as the ones generated through 
an MCMC method are. As discussed in Sec. 13. II one needs to as- 
sign a probability weight given by Eq. {9]l to the point with lowest 
likelihood value at each iteration of the nested sampling algorithm. 
In Fig. 3 o flvan Haasterenl d2009h . the author shows a scatter plot of 
40,000 samples obtained through a nested sampling algorithm. The 
author does not mention whether he has plotted simply the points 
with the lowest likelihood values at each iteration or the probabil- 
ity weights of these points have also been taken into account and 
therefore we are unable to comment on the accuracy of the figure. 

4.3 Curse of Dimensionality 

The author repeatedly attacks nested sampling algorithm in general 
and MultiNest in particular, saying that it suffers severely from 
the 'curse of dimensionality'. We have already discussed in the pre- 
vious section that as far as the general nested sampling framework 
is concerned, this is not true. 

MultiNest implements the nested sampling algorithm 
through an ellipsoidal rejection sampling scheme to sample uni- 
formly from the iso-likelihood contour as discussed in Sec. 13.21 It 
is well known that all rejection sampling scheme s are highly in - 
efficient for high dimensional problems (see e.g. lMackavll2003l) . 
MultiNest was designed to work with problems with moderately 
high number of dimensions and it has proven highly successful to 
deal with highly multi-modal and complex problems in cosmology 
and particle physics phenomenology. It should be noted that nested 
sampling not only provides a way to evaluate the Bayesian evidence 
accurately but with a clever algorithm to sample from the hard con- 
straint, it also provides a solution to deal with highly degenerate 
and multi-modal problems. A very good example is the use of 
Mult iNest in gravitational wave astronomy (see e.g. lFeroz et al.l 
2009b. a) where the problems are inherently multi-modal and so far 
MultiNest has mainly been used as a parameter exploration tool 
with great deal of success. 

For parameter exploration, MultiNest works with reason- 
able efficiency up to ~ 100D beyond which the efficiency drops 
appreciably as expected. Accurate evidence evaluation requires ad- 
equate sampling from the whole of parameter space and even slight 
inaccuracies in estimating the iso-likelihood region through the el- 
lipsoidal decomposition can result in large inaccuracies and there- 
fore MultiNest can calculate the evidence value accurately with 
reasonable efficiency for problems up to ~ 50D. In order to demon- 
strate this, we ran MultiNest on an ro-dimensional ellipsoidal 
Gaussian with likelihoodQ, 

'■n^-(-¥). «» 

with 

<Ji = O.OOli. (17) 

We set uniform prior W(0, 1) for all the parameters so that the an- 
alytical log-evidence value is 0.0 regardless of the dimensionality 
of the problem. We used 1,000 live points with target efficiency e 
set to 1.0 for the 2D problem and reducing it to 0.01 for the 32D 

1 We would have lik ed to test MULTINEST on the same toy problem de- 
scribed in Sec. 5.1 of van Haasteren ( 20091) but we were unable to do so 
as the author did not describe the prior distribution he used and hence we 
chose a similar but not exactly the same problem. 
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n 


Mike 


log(JZ) 


H 


2 


14, 184 


0.04 ±0.10 


10.00 


4 


26, 033 


-0.07 ±0.14 


19.60 


8 


107, 876 


-0.24 ±0.18 


32.40 


16 


651,345 


0.11 ± 0.24 


57.60 


32 


14, 134,227 


0.40 ±0.31 


96.10 



Table 1. The log-evidence (log(-Z)) and information content (H) values 
obtained by the MULTlNEST algorithm when applied to the problem de- 
scribed in Eq. (16) . The analytical \og(Z) is 0.0 regardless of dimensional- 
ity n. 

problem. A value of e = 0.3 was suggested in lFeroz et al. (2009c) 
for the standard CMB data analysis, but it was accompanied by the 
caveat that the user should check that the evidence value is con- 
sistent when e is lowered. We list the number of likelihood eval- 
uations, recovered log(.Z) and information content H in Table Q] 
These results clearly show that MULTlNEST is able to correctly 
evaluate the evidence values even for the 32D problem with the 
posterior occupying only e -96 10 of the prior volume, although the 
efficiency does drop appreciably with the increase in dimension- 
ality of the problem. This drop in efficiency is mainly due to the 
exponential increase in the information content. 

We should also mention that for a Gaussian problem, we have 
been able to calculate the evidence value accurately up to ~ 1000D 
using nested sampling with a Hamiltonian sampling scheme to 
sample from the hard-constraint. This method is still under devel- 
opment and we would present the results in a forthcoming publica- 
tion. 



5 CONCLUSIONS 

MULTlNEST has proven to be a very useful and powerful tool to 
carry out Bayesian inference for a wide variety of problems in cos- 
mology and particle physics phenomenology as well as for gen- 
eral modera tely dimensional infe rence problems. While the method 
proposed in van Haasteren 2009 to calculate the Bayesian evidence 
is interesting for sufficiently simple problems, it does not have 
general applicability and relies on the availability of samples dis- 
tributed according to the posterior distribution. The comparison of 
this method with MULTlNEST is completely unfair as MULTlNEST 
provides the means to perform full Bayesian analysis without mak- 
ing any assumptions about the nature of the problem nor does it 
rely on the availability of posterior samples, in fact it can be used 
to provide the posterior samples. 
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